Abstract

The optimal decoder achieving the outage capacity under imperfect channel estimation is
investigated. First, by searching within the family of nearest-neighbor decoders, which can be
easily implemented on most practical coded modulation systems, we derive a decoding metric that
minimizes the average of the transmission error probability over all channel estimation errors.
Next, we specialize our general expression to obtain the corresponding decoding metric for fading
MIMO channels. According to the notion of estimation-induced outage (EIO) capacity introduced in
our previous work, and assuming no channel state information (CSI) at the transmitter, we
characterize the maximal achievable information rates, using Gaussian codebooks, associated with
the proposed decoder. In the case of uncorrelated Rayleigh fading, these achievable rates are
compared to the rates achieved by the classical mismatched maximum-likelihood (ML) decoder and to
the ultimate limits given by the EIO capacity. Numerical results show that the derived metric
provides significant gains for the considered scenario, in terms of achievable information rates
and bit error rate (BER), in a bit interleaved coded modulation (BICM) framework, without
introducing any additional decoding complexity.

Index Terms 

Fading channels, Maximum likelihood estimation, Information rates, Decoding, MIMO systems. 
1 The material in this paper was published in part at the International Symposium on Information Theory (ISIT07).
I. Introduction

Consider a practical wireless communication system in which the receiver has access only to noisy
channel estimates, which may in some circumstances be poor, and in which these estimates are not
available at the transmitter. This constraint is a practical concern for the design of
communication systems that, in spite of their limited knowledge, have to ensure communications
with a prescribed quality of service (QoS). This QoS requires guaranteeing transmissions with a
given target information rate and small error probability, no matter what degree of estimation
accuracy arises during the transmission. The described scenario raises two important questions:
(i) what are the theoretical limits of reliable transmission rates, using the best possible
decoder in the presence of imperfect channel state information at the receiver (CSIR), and
(ii) how can those limits be achieved by using practical decoders in coded modulation systems?
Of course, these questions are strongly related to the notion of capacity, which must take into
account the above-mentioned constraints.

We addressed the first question (i) in [1], for arbitrary memoryless channels, by introducing the
notion of Estimation-Induced Outage Capacity (EIO capacity). This notion characterizes the
information-theoretic limits of such scenarios, where the transmitter and receiver strive to
construct codes for ensuring the desired communication service, no matter what degree of
estimation accuracy arises during the transmission. The explicit expression of this capacity
allows one to evaluate the optimal trade-off between the maximal achievable outage rate (i.e.,
maximized over all possible transmitter-receiver pairs) and the outage probability
$\gamma_{\mathrm{QoS}}$ (the QoS constraint). This can be used by a system designer to optimally
share the available resources (e.g., power for transmission and training, the amount of training
used, etc.), so that the communication requirements are satisfied. Nevertheless, the theoretical
decoder used to achieve this capacity cannot be implemented on practical communication systems.

The second question (ii), concerning the derivation of a practical decoder that can achieve
information rates close to the EIO capacity, is addressed in this paper. Classically, one replaces
the exact channel by its estimate in the decoding metric. This is known as mismatched
maximum-likelihood (ML) decoding. However, this scheme is not appropriate in the presence of
channel estimation errors (CEE), at least if the estimation errors are large, i.e., for a small
number of training symbols [2]. This problem has recently motivated a lot of work. In [3] and [4]
the authors analyze the bit error rate (BER) performance of this mismatched decoder in the case of
an orthogonal frequency division multiplexing (OFDM) system. Reference [5] considered a
training-based MIMO system and showed that, to compensate for the performance degradation due to
CEE, the number of receive antennas should be increased, which may become a limiting factor for
mobile applications. On the other hand, the performance of Bit Interleaved Coded Modulation (BICM)
over fading MIMO channels with perfect CSI was studied, for instance, in [6], [7] and [8]. Cavers
[9] derived a tight upper bound on the symbol error rate of pilot symbol assisted modulation
(PSAM) for a 16-QAM constellation. A similar investigation was carried out in [10], showing that
for iterative decoding of BICM at low SNR, the quality of channel estimates is too poor for use in
the mismatched ML decoder.

As an alternative to the aforementioned decoder, Tarokh et al. in [11] and Taricco and Biglieri
in [2] proposed an improved ML detection metric and applied it to a space-time coded MIMO system,
where they showed the superiority of this metric in terms of BER. Interestingly enough, this
decoding metric can be formally derived as a special case of the general framework presented in
this paper. So far, most of the research in the field has focused on evaluating the performance
of mismatched decoders in terms of BER (cf. [12]), without providing an answer to question (ii).
In [13], the authors investigate the achievable rates of a weighted nearest-neighbor decoder for
multiple-antenna channels. Moreover, in [14] and [1], the authors show that the achievable rates
using mismatched ML decoding are largely sub-optimal (at least for a limited number of training
symbols) compared to the ultimate limits given by the EIO capacity. In this paper, according to
the notion of EIO capacity, we investigate the maximal achievable information rate

July, 2007 DRAFT 




with Gaussian codebooks of the improved decoder in [2], [11]. Furthermore, it can be shown 
that this decoder achieves the capacity of a composite (more noisy) channel. 

This paper is organized as follows. Section II briefly reviews our notion of capacity. Then, we
search within the family of decoders that can be easily implemented on most practical coded
modulation systems to derive the general expression of the decoder. This decoder minimizes the
average of the transmission error probability over all CEE. We accomplish this by exploiting the
availability of the statistic characterizing the quality of channel estimates, i.e., the a
posteriori probability density function (pdf) of the unknown (true) channel conditioned on its
estimate. Section III describes the fading MIMO model. In Section IV, we specialize our expression
of the decoding metric to the case of MIMO channels and use it for iterative decoding of
MIMO-BICM. In Section V, we compute the achievable information rates of a receiver using the
proposed decoder and compare them to the EIO capacity and to the achievable rates of the classical
mismatched approach. Section VI illustrates via simulations, conducted over uncorrelated Rayleigh
fading, the performance of the improved decoder in terms of achievable outage rates and BER,
compared to those provided by mismatched ML decoding.

Notational conventions are as follows. Upper and lower case bold symbols are used to denote
matrices and vectors; $\mathbf{I}_M$ represents an $(M \times M)$ identity matrix;
$\mathbb{E}_{\mathbf{X}}\{\cdot\}$ refers to expectation with respect to the random vector
$\mathbf{X}$; $|\cdot|$ and $\|\cdot\|_F$ denote matrix determinant and Frobenius norm,
respectively; $(\cdot)^T$ and $(\cdot)^\dagger$ denote vector transpose and Hermitian transpose,
respectively.

II. Decoding under Imperfect Channel Estimation 

Throughout this section we focus on deriving a practical decoder for general memoryless 
channels that achieves information rates close to the EIO capacity (the ultimate bound). 

A. Communication Model Under Channel Uncertainty 

A specific instance of the memoryless channel is characterized by a transition probability
$W(y|x,\theta) \in \mathcal{W}_\Theta$ with an unknown channel state $\theta$, over input and
output alphabets $\mathcal{X}$ and $\mathcal{Y}$. Here,
$\mathcal{W}_\Theta = \{W(\cdot|x,\theta) : x \in \mathcal{X},\ \theta \in \Theta\}$ is a family
of conditional pdfs parameterized by the vector of parameters
$\theta \in \Theta \subseteq \mathbb{C}^d$, where $d$ denotes the number of parameters. Throughout
the paper we assume that the channel state, which neither the transmitter nor the receiver knows
exactly, remains constant within blocks of symbols, whose length is related to the product of the
coherence time and the coherence bandwidth of a wireless channel, and that the states for
different blocks are i.i.d. with $\theta \sim \psi(\theta)$ (e.g., block Rayleigh fading). The
transmitter does not know $\theta$, and the receiver only knows an estimate $\hat\theta$ and a
characterization of the estimator performance in terms of the conditional pdf
$\psi(\theta|\hat\theta)$ (obtained by using $\mathcal{W}_\Theta$, the estimation function and
$\psi(\theta)$). A decoder using $\hat\theta$ instead of $\theta$ might not support an
information rate $R$ (even small rates might not be supported if $\hat\theta$ and $\theta$ differ
strongly). Consequently, outage events induced by CEE will occur with a certain probability
$\gamma_{\mathrm{QoS}}$. The scenario underlying these assumptions is motivated by current
wireless systems, where the coherence time for mobile receivers may be too short to permit
reliable estimation of the fading coefficients and, in spite of this fact, the desired
communication service must be guaranteed. This leads to the following notion of capacity.

B. A Brief Review of EIO Capacity 

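A message $m \in \mathcal{M} = \{1, \dots, \lfloor \exp(nR) \rfloor\}$ is transmitted using a pair
$(\varphi,\phi)$ of mappings, where $\varphi : \mathcal{M} \mapsto \mathcal{X}^n$ is the encoder
and $\phi : \mathcal{Y}^n \times \Theta \mapsto \mathcal{M}$ is the decoder (which utilizes
$\hat\theta$). The random rate, which depends on the unknown channel realization $\theta$ through
its probability of error, is given by $n^{-1}\log M_{\theta,\hat\theta}$. The maximum error
probability (over all messages) is

$$e^{(n)}_{\max}(\varphi,\phi,\hat\theta;\theta) = \max_{m\in\mathcal{M}} \int_{\{\mathbf{y}\in\mathcal{Y}^n :\ \phi(\mathbf{y},\hat\theta)\neq m\}} dW^n(\mathbf{y}\,|\,\varphi(m),\theta), \qquad (1)$$

where $\mathbf{y} = (y_1, \dots, y_n)$. For a given channel estimate $\hat\theta$, and
$0 < \epsilon, \gamma_{\mathrm{QoS}} < 1$, an outage rate $R \ge 0$ is
$(\epsilon,\gamma_{\mathrm{QoS}})$-achievable if for every $\delta > 0$ and every sufficiently
large $n$ there exists a sequence of length-$n$ block codes such that the rate satisfies the
quality of service

$$\Pr\big(\Lambda_\epsilon(R,\hat\theta)\,\big|\,\hat\theta\big) \ge 1 - \gamma_{\mathrm{QoS}}, \qquad (2)$$

where $\Lambda_\epsilon(R,\hat\theta) = \{\theta \in \Lambda_\epsilon : n^{-1}\log M_{\theta,\hat\theta} \ge R - \delta\}$
stands for the set of all channel states allowing for the desired transmission rate $R$, and
$\Lambda_\epsilon = \{\theta \in \Theta : e^{(n)}_{\max}(\varphi,\phi,\hat\theta;\theta) \le \epsilon\}$
is the set of all channel states allowing for reliable decoding (arbitrarily small error
probability). This definition requires that maximum error probabilities larger than $\epsilon$
occur with probability less than $\gamma_{\mathrm{QoS}}$. The practical advantage of such a
definition is that for $100(1-\gamma_{\mathrm{QoS}})\%$ of channel estimates, the transmitter and
receiver strive to construct codes for ensuring the desired communication service. The EIO
capacity is then defined as the largest $(\epsilon,\gamma_{\mathrm{QoS}})$-achievable rate, for an
outage probability $\gamma_{\mathrm{QoS}}$ and a given channel estimate $\hat\theta$, as

$$C(\gamma_{\mathrm{QoS}},\hat\theta) = \lim_{\epsilon\downarrow 0}\ \sup_{(\varphi,\phi)} \big\{R \ge 0 : \Pr\big(\Lambda_\epsilon(R,\hat\theta)\,\big|\,\hat\theta\big) \ge 1 - \gamma_{\mathrm{QoS}}\big\}, \qquad (3)$$

where the maximization is taken over all encoder and decoder pairs. In [1], we proved the
following coding theorem, which provides an explicit way to evaluate the maximal outage rate (3)
versus the outage probability $\gamma_{\mathrm{QoS}}$ for an estimate $\hat\theta$, characterized
by $\psi(\theta|\hat\theta)$.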

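Theorem 2.1: Given an outage probability $0 < \gamma_{\mathrm{QoS}} < 1$, the EIO capacity is
given by

$$C(\gamma_{\mathrm{QoS}},\hat\theta) = \max_{P\in\mathcal{P}_\Gamma(\mathcal{X})}\ \sup_{\Lambda\subset\Theta :\ \Pr(\Lambda|\hat\theta)\ge 1-\gamma_{\mathrm{QoS}}}\ \inf_{\theta\in\Lambda}\ I\big(P, W(\cdot|\cdot,\theta)\big), \qquad (4)$$

where $I(\cdot)$ denotes the mutual information of the channel $W(y|x,\theta)$ and
$\mathcal{P}_\Gamma(\mathcal{X})$ is the set of input distributions that do not depend on
$\theta$ and satisfy the input constraint $\int g(x)\,dP(x) \le \Gamma$ for a nonnegative cost
function $g : \mathcal{X} \mapsto [0,\infty)$.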

The existence of a decoder $\phi$ in (3) achieving the capacity (4) is proved using a
random-coding argument, based on the well-known method of typical sequences [15]. Nevertheless,
this decoder cannot be implemented on practical communication systems.

C. Derivation of a Practical Decoder Using Channel Estimation Accuracy 

We now consider the problem of deriving a practical decoder that achieves the capacity (4).
Assume that we restrict the search for decoding functions $\phi$, maximizing (3), to the class of
additive decoding metrics, which can be implemented on realistic systems. This means that for a
given channel output $\mathbf{y} = (y_1, \dots, y_n)$, we set the decoding function

$$\phi_{\mathcal{D}}(\mathbf{y},\hat\theta) = \arg\min_{m\in\mathcal{M}} \mathcal{D}^n\big(\varphi(m),\mathbf{y}\,|\,\hat\theta\big), \qquad (5)$$

where $\mathcal{D}^n(\mathbf{x},\mathbf{y}|\hat\theta) = n^{-1}\sum_{i=1}^{n} \mathcal{D}(x_i,y_i|\hat\theta)$
and $\mathcal{D} : \mathcal{X}\times\mathcal{Y}\times\Theta \mapsto \mathbb{R}_{\ge 0}$ is an
arbitrary per-letter additive metric. Consequently, the maximization in (3) is actually
equivalent to maximizing over all decoding metrics $\mathcal{D}$. Note, however, that this
restriction does not necessarily lead to an optimal decoder achieving the capacity.

Problem statement: In order to find the optimal decoding metric $\mathcal{D}$ maximizing the
outage rates in (3), for a given outage probability $\gamma_{\mathrm{QoS}}$ and channel estimate
$\hat\theta$, it is necessary to look at the intrinsic properties of the capacity definition.
Observe that the size of the set $\Lambda_\epsilon$ of all channel states allowing for reliable
decoding is determined by the decoding function $\phi$. The maximal achievable rate $R$,
constrained by the outage probability (2), is thus limited by this size. Hence, for a given
decoder $\phi$, there exists an optimal set $\Lambda^* \subseteq \Lambda_\epsilon$ of channel
states with conditional probability larger than $1 - \gamma_{\mathrm{QoS}}$, providing the largest
achievable rate, which follows as the minimal instantaneous rate for the worst
$\theta \in \Lambda^*$. For the optimal decoder, this set equals the set $\Lambda^*$ maximizing
the expression (4). Hence, an optimal decoding metric must guarantee minimum error probability (1)
for every $\theta \in \Lambda^*$.

The computation of such a metric becomes very difficult (and is not necessarily feasible within
the class of decoders in (5)), since the maximization in (3) using $\phi_{\mathcal{D}}$ is not an
explicit function of $\mathcal{D}$. However, it is interesting to note that if the set $\Lambda^*$
defines a compact and convex set of channels $\mathcal{W}_{\Lambda^*}$, then the optimal decoding
metric can be chosen as the ML decoder $\mathcal{D}^*(x,y|\hat\theta) = -\log W(y|x,\theta^*)$,
where $\theta^*$ is the channel state minimizing the mutual information in (4). The receiver can
thus be an ML receiver with respect to the worst channel in the family [16]. However, in most
practical cases, the channel states are represented by vectors of complex coefficients that do
not lead to convex sets of channels.

Optimal decoder for composite channels: Instead of trying to find an optimal decoding metric
minimizing the error probability (1) for every $\theta \in \Lambda^*$, we propose to look at the
decoding metric minimizing the average of the transmission error probability over all CEE. This
means

$$\mathcal{D}_{\mathcal{M}} = \arg\min_{\mathcal{D}} \int_{\Theta} e^{(n)}_{\max}(\varphi,\phi_{\mathcal{D}},\hat\theta;\theta)\, d\psi(\theta|\hat\theta), \qquad (6)$$

where $e^{(n)}_{\max}$ is obtained by replacing (5) in (1). Since the channel $W$ is memoryless,
the average error probability in (6) can be written as the error probability of a composite (more
noisy) channel $\widetilde{W}(y|x,\hat\theta)$. This channel follows as the average of the unknown
channel $W$ over all CEE given the estimate $\hat\theta$. Then, by taking the logarithm of this
channel we obtain its ML decoder, which minimizes (for $n$ sufficiently large) the error
probability in (6). Actually, by following an analogy with the proof in [16], it can be shown
that

$$\mathcal{D}_{\mathcal{M}}(x,y|\hat\theta) = -\log \widetilde{W}(y|x,\hat\theta) \quad \text{with} \quad \widetilde{W}(y|x,\hat\theta) = \int_{\Theta} W(y|x,\theta)\, d\psi(\theta|\hat\theta). \qquad (7)$$

Remark: We emphasize that this decoder cannot guarantee small error probabilities for every
channel state $\theta \in \Lambda^*$, and consequently it only achieves a lower bound of the EIO
capacity (4). Nevertheless, it achieves the capacity of the composite channel. The remaining
question is how much lower the achievable outage rates using the metric (7) are, compared to
those of the theoretical decoder achieving the EIO capacity. In Section V, we evaluate (7) and
its achievable information rates for the fading MIMO channel with no CSI at the transmitter.

III. System Model 

A. Fading MIMO Channel 

We consider a single-user MIMO system with $M_T$ transmit and $M_R$ receive antennas transmitting
over a frequency non-selective channel, and refer to it as a MIMO channel. Fig. 1 depicts the
BICM coding scheme used at the transmitter. The binary data sequence $\mathbf{b}$ is encoded by a
non-recursive and non-systematic convolutional (NRNSC) code, before being interleaved by a
quasi-random interleaver. The output bits $\mathbf{d}$ are gathered in subsequences of $B$ bits
and mapped to complex $M$-QAM ($M = 2^B$) vector symbols $\mathbf{x}$ with average power $P$ per
transmit antenna. We also send some pilot symbols at the beginning of each data frame for channel
estimation. The symbols of a frame are then multiplexed for transmission through the $M_T$
antennas. Assuming a frame of $L$ transmitted symbols associated with each channel matrix
$\mathbf{H}_k$, the received signal vector $\mathbf{y}_k$ of dimension $(M_R \times 1)$ is given
by

$$\mathbf{y}_k = \mathbf{H}_k \mathbf{x}_k + \mathbf{z}_k, \quad k = 1, \dots, L, \qquad (8)$$

where $\mathbf{x}_k$ is the $(M_T \times 1)$ vector of transmitted symbols, referred to as a
compound symbol. Here, the entries of the random matrix $\mathbf{H}_k$ are independent
identically distributed (i.i.d.) Zero-Mean Circularly Symmetric Complex Gaussian (ZMCSCG) random
variables. Thus, the channel state $\theta = \mathbf{H}_k$ is distributed as
$\mathbf{H}_k \sim \psi_{\mathbf{H}}(\mathbf{H}) = \mathcal{CN}(\mathbf{0}, \mathbf{I}_{M_T} \otimes \boldsymbol{\Sigma}_H)$, with

$$\mathcal{CN}(\mathbf{0}, \mathbf{I}_{M_T} \otimes \boldsymbol{\Sigma}_H) = \frac{1}{\pi^{M_R M_T}\, |\boldsymbol{\Sigma}_H|^{M_T}} \exp\big\{-\mathrm{tr}\big(\mathbf{H}^\dagger \boldsymbol{\Sigma}_H^{-1} \mathbf{H}\big)\big\}, \qquad (9)$$

where $\boldsymbol{\Sigma}_H$ is the Hermitian covariance matrix of the columns of $\mathbf{H}$
(assumed to be the same for all columns), i.e., $\boldsymbol{\Sigma}_H = \sigma_h^2 \mathbf{I}_{M_R}$.
The noise vector $\mathbf{z}_k \in \mathbb{C}^{M_R \times 1}$ is a ZMCSCG random vector with
covariance matrix $\boldsymbol{\Sigma}_0 = \sigma_z^2 \mathbf{I}_{M_R}$. Both $\mathbf{H}_k$ and
$\mathbf{z}_k$ are assumed to be ergodic and stationary random processes, and the channel matrix
is independent of $\mathbf{x}_k$ and $\mathbf{z}_k$.

B. Pilot Based Channel Estimation 

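Assuming that the channel matrix is time-invariant over an entire frame, channel estimation is
usually performed on the basis of known training (pilot) symbols transmitted at the beginning of
each frame. The transmitter, before sending the data $\mathbf{x}_k$, sends a training sequence of
$N$ vectors $\mathbf{X}_T = (\mathbf{x}_{T,1}, \dots, \mathbf{x}_{T,N})$. According to the channel
model (8), this sequence is affected by the channel matrix $\mathbf{H}_k$, allowing the receiver
to observe separately $\mathbf{Y}_{T,k} = \mathbf{H}_k \mathbf{X}_T + \mathbf{Z}_{T,k}$, where
$\mathbf{Z}_{T,k}$ is the noise matrix affecting the transmission of training symbols. We assume
that the coherence time is much longer than the training time and that the average energy of the
training symbols is $P_T = \frac{1}{N M_T}\mathrm{tr}\big(\mathbf{X}_T \mathbf{X}_T^\dagger\big)$.

We focus on the estimation of $\mathbf{H}_k$ from the observed signals $\mathbf{Y}_{T,k}$ and
$\mathbf{X}_T$. In the ML sense this estimate is obtained by minimizing
$\|\mathbf{Y}_{T,k} - \mathbf{H}_k \mathbf{X}_T\|^2$ with respect to $\mathbf{H}_k$. This yields
$\hat{\mathbf{H}}_{\mathrm{ML},k} = \mathbf{Y}_{T,k} \mathbf{X}_T^\dagger (\mathbf{X}_T \mathbf{X}_T^\dagger)^{-1} = \mathbf{H}_k + \boldsymbol{\mathcal{E}}_k$,
where $\boldsymbol{\mathcal{E}}_k = \mathbf{Z}_{T,k} \mathbf{X}_T^\dagger (\mathbf{X}_T \mathbf{X}_T^\dagger)^{-1}$
denotes the estimation error matrix. For simplicity, we assume orthogonal training sequences, for
which we must have $N \ge M_T$, so that the entries of the error matrix become uncorrelated. Thus,
the matrix $\mathbf{X}_T$ must have full rank $M_T$, and $\mathbf{X}_T \mathbf{X}_T^\dagger$ must
be nonsingular, with $\mathbf{X}_T$ having orthogonal rows such that
$\mathbf{X}_T \mathbf{X}_T^\dagger = N P_T \mathbf{I}_{M_T}$. Next, denoting by
$\boldsymbol{\mathcal{E}}_j$ the $j$-th column of the error matrix $\boldsymbol{\mathcal{E}}$, we
can write
$\boldsymbol{\Sigma}_{\mathcal{E}} = \mathbb{E}\{\boldsymbol{\mathcal{E}}_j \boldsymbol{\mathcal{E}}_j^\dagger\} = \mathrm{SNR}_T^{-1} \mathbf{I}_{M_R}$
with $\mathrm{SNR}_T = N P_T / \sigma_z^2$, yielding a white error matrix, i.e., the entries of
$\boldsymbol{\mathcal{E}}$ are i.i.d. ZMCSCG random variables with variance
$\sigma_{\mathcal{E}}^2 = \mathrm{SNR}_T^{-1}$. Thus, for each frame, the conditional pdf of
$\hat\theta = \hat{\mathbf{H}}_{\mathrm{ML}}$ given $\theta = \mathbf{H}$ is the complex normal
matrix pdf

$$\psi_{\hat{\mathbf{H}}_{\mathrm{ML}}|\mathbf{H}}(\hat{\mathbf{H}}_{\mathrm{ML}}|\mathbf{H}) = \mathcal{CN}(\mathbf{H}, \mathbf{I}_{M_T} \otimes \boldsymbol{\Sigma}_{\mathcal{E}}). \qquad (10)$$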
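The estimation step above can be checked numerically. The following is a minimal sketch, assuming a $2 \times 2$ channel with $\sigma_h^2 = 1$ and the hypothetical orthogonal pilot matrix $\mathbf{X}_T = \sqrt{P_T}\,[1\ 1;\ 1\ {-1}]$ (so $N = 2$); the function names and parameter values are illustrative choices, not taken from the paper. It forms the ML estimate and measures the empirical variance of the error entries, which should be close to $\mathrm{SNR}_T^{-1} = \sigma_z^2 / (N P_T)$.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def hermitian(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def zmcscg(rng, variance):
    """One zero-mean circularly symmetric complex Gaussian sample."""
    s = (variance / 2.0) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def ml_estimate(Y_T, X_T, N, P_T):
    """H_ML = Y_T X_T^H (X_T X_T^H)^{-1}, valid when X_T X_T^H = N P_T I."""
    G = matmul(Y_T, hermitian(X_T))
    return [[g / (N * P_T) for g in row] for row in G]

def empirical_error_variance(trials=5000, sigma_z2=0.5, P_T=1.0, seed=1):
    """Average |H_ML - H|^2 per entry over many independent frames."""
    rng = random.Random(seed)
    N, M_T, M_R = 2, 2, 2                      # assumed dimensions
    a = P_T ** 0.5
    X_T = [[a, a], [a, -a]]                    # assumed pilots, orthogonal rows
    total = 0.0
    for _ in range(trials):
        H = [[zmcscg(rng, 1.0) for _ in range(M_T)] for _ in range(M_R)]
        Z = [[zmcscg(rng, sigma_z2) for _ in range(N)] for _ in range(M_R)]
        HX = matmul(H, X_T)
        Y = [[HX[i][j] + Z[i][j] for j in range(N)] for i in range(M_R)]
        H_hat = ml_estimate(Y, X_T, N, P_T)
        total += sum(abs(H_hat[i][j] - H[i][j]) ** 2
                     for i in range(M_R) for j in range(M_T))
    return total / (trials * M_R * M_T)
```

With the default values, the prediction is $\sigma_z^2 / (N P_T) = 0.25$, and the Monte Carlo average returned by `empirical_error_variance()` should lie close to it.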

IV. Metric Computation and Iterative Decoding of BICM 

In this section, we specialize the expression (7) to derive the decoding metric for the MIMO
channel (8), and then we consider MIMO-BICM decoding with the derived metric.

A. Mismatched ML Decoder 

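The classical mismatched ML decoder consists of the likelihood function of the channel pdf
evaluated at the channel estimate $\hat{\mathbf{H}}_{\mathrm{ML}}$. This leads to the following
Euclidean distance metric:

$$\mathcal{D}_{\mathrm{ML}}(\mathbf{x},\mathbf{y}|\hat{\mathbf{H}}_{\mathrm{ML}}) = -\log W(\mathbf{y}|\mathbf{x},\hat{\mathbf{H}}_{\mathrm{ML}}) = \|\mathbf{y} - \hat{\mathbf{H}}_{\mathrm{ML}}\mathbf{x}\|^2 + \text{const}. \qquad (11)$$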

B. Metric Computation 

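We now specialize the expression (7) to the case of the MIMO channel (8). To this end, we need to
derive the pdf $\psi_{\mathbf{H}|\hat{\mathbf{H}}_{\mathrm{ML}}}(\mathbf{H}|\hat{\mathbf{H}}_{\mathrm{ML}})$,
which can be obtained by using the pdfs (10) and (9) (see Appendix A). The corresponding pdf is

$$\psi_{\mathbf{H}|\hat{\mathbf{H}}_{\mathrm{ML}}}(\mathbf{H}|\hat{\mathbf{H}}_{\mathrm{ML}}) = \mathcal{CN}\big(\boldsymbol{\Sigma}_\Delta \hat{\mathbf{H}}_{\mathrm{ML}},\ \mathbf{I}_{M_T} \otimes \boldsymbol{\Sigma}_\Delta \boldsymbol{\Sigma}_{\mathcal{E}}\big), \qquad (12)$$

where $\boldsymbol{\Sigma}_\Delta = \boldsymbol{\Sigma}_H (\boldsymbol{\Sigma}_{\mathcal{E}} + \boldsymbol{\Sigma}_H)^{-1} = \mathbf{I}_{M_R}\,\delta$
and $\delta = \dfrac{\mathrm{SNR}_T\,\sigma_h^2}{\mathrm{SNR}_T\,\sigma_h^2 + 1}$. The availability
of the distribution (12) characterizing the CEE is the key feature of pilot assisted channel
estimation. Then, by averaging the channel $W(\mathbf{y}|\mathbf{x},\mathbf{H})$ over all CEE,
using the pdf (12), and after some algebra we obtain the composite channel (cf. Appendix A)

$$\widetilde{W}(\mathbf{y}|\mathbf{x},\hat{\mathbf{H}}_{\mathrm{ML}}) = \mathcal{CN}\big(\delta \hat{\mathbf{H}}_{\mathrm{ML}}\mathbf{x},\ \boldsymbol{\Sigma}_0 + \delta \boldsymbol{\Sigma}_{\mathcal{E}} \|\mathbf{x}\|^2\big). \qquad (13)$$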
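The averaging that leads to (13) can be illustrated with a short simulation. This is a minimal sketch for the scalar case $M_T = M_R = 1$ (an assumption made only to keep the example short): it draws the true channel from the a posteriori pdf (12), passes a fixed symbol through the channel (8), and compares the empirical mean and variance of the output with the composite-channel parameters $\delta \hat h x$ and $\sigma_z^2 + \delta \sigma_{\mathcal{E}}^2 |x|^2$. The function names and the numerical values are hypothetical. Only the first two moments are checked here; Gaussianity follows because the output is a sum of independent Gaussian terms.

```python
import random

def zmcscg(rng, variance):
    """One zero-mean circularly symmetric complex Gaussian sample."""
    s = (variance / 2.0) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def composite_moments(h_hat, x, sigma_z2, sigma_h2, snr_t,
                      trials=20000, seed=2):
    """Return (empirical mean, empirical variance, predicted mean,
    predicted variance) of y given the estimate h_hat and the symbol x."""
    rng = random.Random(seed)
    sigma_e2 = 1.0 / snr_t                               # sigma_E^2 = SNR_T^{-1}
    delta = snr_t * sigma_h2 / (snr_t * sigma_h2 + 1.0)
    mean_pred = delta * h_hat * x                        # mean in (13)
    var_pred = sigma_z2 + delta * sigma_e2 * abs(x) ** 2 # variance in (13)
    acc_mean, acc_var = 0j, 0.0
    for _ in range(trials):
        h = delta * h_hat + zmcscg(rng, delta * sigma_e2)  # draw from (12)
        y = h * x + zmcscg(rng, sigma_z2)                  # channel model (8)
        acc_mean += y
        acc_var += abs(y - mean_pred) ** 2
    return acc_mean / trials, acc_var / trials, mean_pred, var_pred
```

For instance, with $\hat h = 0.8 + 0.3j$, $x = 1 - j$, $\sigma_z^2 = 0.5$, $\sigma_h^2 = 1$ and $\mathrm{SNR}_T = 4$, we have $\delta = 0.8$ and a predicted variance of $0.5 + 0.8 \cdot 0.25 \cdot 2 = 0.9$.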

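Finally, from (13) the optimal decoding metric for the MIMO channel (8) reduces to

$$\mathcal{D}_{\mathcal{M}}^{\mathrm{MIMO}}(\mathbf{x},\mathbf{y}|\hat{\mathbf{H}}_{\mathrm{ML}}) = M_R \log\big(\sigma_z^2 + \delta\sigma_{\mathcal{E}}^2 \|\mathbf{x}\|^2\big) + \frac{\|\mathbf{y} - \delta\hat{\mathbf{H}}_{\mathrm{ML}}\mathbf{x}\|^2}{\sigma_z^2 + \delta\sigma_{\mathcal{E}}^2 \|\mathbf{x}\|^2}. \qquad (14)$$

This metric coincides with that proposed for space-time decoding, from independent results, in
[2]. We note that under near-perfect CSI, obtained when $N \to \infty$,

$$\lim_{N\to\infty} \frac{\mathcal{D}_{\mathcal{M}}^{\mathrm{MIMO}}(\mathbf{x},\mathbf{y}|\hat{\mathbf{H}}_{\mathrm{ML}})}{\mathcal{D}_{\mathrm{ML}}(\mathbf{x},\mathbf{y}|\hat{\mathbf{H}}_{\mathrm{ML}})} = 1 \quad \text{almost surely.} \qquad (15)$$

Consequently, we have the expected result that the metric (14) tends to the classical mismatched
ML decoding metric (11) when the estimation error variance $\sigma_{\mathcal{E}}^2 \to 0$.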
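To make the two metrics concrete, the sketch below evaluates (11) and (14) for a given observation. It is only an illustration under stated assumptions: matrices are plain lists of rows, the function names are ours, and the additive constant in (11) is dropped. In the limit of a very large training SNR it also checks that (14) approaches the matched Gaussian log-likelihood form $M_R \log \sigma_z^2 + \|\mathbf{y} - \hat{\mathbf{H}}\mathbf{x}\|^2 / \sigma_z^2$, i.e. the Euclidean distance of (11) up to scaling and an additive constant.

```python
import math

def residual_norm2(y, H, x, scale=1.0):
    """|| y - scale * H x ||^2 for H given as a list of rows."""
    total = 0.0
    for i, row in enumerate(H):
        hx = sum(h * xi for h, xi in zip(row, x))
        total += abs(y[i] - scale * hx) ** 2
    return total

def mismatched_metric(y, H_hat, x):
    """Mismatched ML metric (11), without the additive constant."""
    return residual_norm2(y, H_hat, x)

def improved_metric(y, H_hat, x, sigma_z2, sigma_h2, snr_t):
    """Improved metric (14); snr_t is the training SNR, N P_T / sigma_z^2."""
    M_R = len(y)
    sigma_e2 = 1.0 / snr_t
    delta = snr_t * sigma_h2 / (snr_t * sigma_h2 + 1.0)
    # Effective noise level: channel noise plus residual estimation error.
    noise = sigma_z2 + delta * sigma_e2 * sum(abs(xi) ** 2 for xi in x)
    return M_R * math.log(noise) + residual_norm2(y, H_hat, x, delta) / noise
```

Note that the improved metric depends on $\|\mathbf{x}\|^2$ through the effective noise level, so for constellations with non-constant energy (such as 16-QAM) it weights candidate symbols differently from the Euclidean distance, at the same order of computational cost.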

C. Receiver Structure 

The problem of decoding MIMO-BICM has been addressed in [17] under the assumption of perfect
CSIR. Here we consider the same problem with CEE, for which we use the metric (14) in the
iterative decoding process of BICM. Basically, the receiver consists of the combination of two
sub-blocks operating successively. The block diagrams of the transmitter and the receiver are
shown in Fig. 1 and Fig. 2, respectively. The first sub-block, referred to as the soft
symbol-to-bit MIMO demapper, produces bit metrics (probabilities) from the input symbols, and the
second one is a soft-input soft-output (SISO) trellis decoder. Each sub-block can take advantage
of the a posteriori probabilities (APP) provided by the other sub-block as a priori information.
Here, SISO decoding is performed using the well-known forward-backward algorithm [18]. We recall
the formulation of the soft MIMO detector.

Suppose first that the channel matrix $\mathbf{H}$ is perfectly known at the receiver. The MIMO
demapper provides at its output the extrinsic probabilities on the coded and interleaved bits
$\mathbf{d}$. Let $d_{k,j}$, $j = 1, \dots, BM_T$, be the interleaved bits corresponding to the
$k$-th compound symbol $\mathbf{x}_k \in \mathcal{Q}$, where the cardinality of $\mathcal{Q}$ is
equal to $2^{BM_T}$. The extrinsic probability $P_{\mathrm{dem}}(d_{k,j})$ of the bit $d_{k,j}$
(bit metric) at the MIMO demapper output is calculated as

$$P_{\mathrm{dem}}(d_{k,j} = 1) = K \sum_{\substack{\mathbf{x}_k \in \mathcal{Q} \\ d_{k,j} = 1}} \exp\big[-\mathcal{D}(\mathbf{x}_k,\mathbf{y}_k|\mathbf{H}_k)\big] \prod_{\substack{i=1 \\ i \neq j}}^{BM_T} P_{\mathrm{dec}}(d_{k,i}), \qquad (16)$$

where $\mathcal{D}(\mathbf{x}_k,\mathbf{y}_k|\mathbf{H}_k) = -\log W(\mathbf{y}_k|\mathbf{x}_k,\mathbf{H}_k)$,
$K$ is the normalization factor satisfying
$P_{\mathrm{dem}}(d_{k,j} = 1) + P_{\mathrm{dem}}(d_{k,j} = 0) = 1$, and
$P_{\mathrm{dec}}(d_{k,i})$ is the extrinsic information coming from the SISO decoder. The
summation in (16) is taken over the product of the channel likelihood given a compound symbol
$\mathbf{x}_k$ and the a priori probability of this symbol (the term $\prod P_{\mathrm{dec}}$)
fed back from the SISO decoder at the previous iteration. Concerning this latter term, the a
priori probability of the bit $d_{k,j}$ itself has been excluded, so as to allow the exchange of
extrinsic information between the channel decoder and the MIMO demapper. Also, note that this
term assumes independent coded bits $d_{k,i}$, which is a valid approximation for random
interleaving of large size. At the first iteration we set $P_{\mathrm{dec}}(d_{k,i}) = 1/2$
(there is no a priori information).

Note that by replacing the unknown channel in (16) by its estimate $\hat{\mathbf{H}}_k$, we
obtain the mismatched ML decoder (11). The proposed decoder follows by introducing the metric
$\mathcal{D}_{\mathcal{M}}^{\mathrm{MIMO}}(\mathbf{x}_k,\mathbf{y}_k|\hat{\mathbf{H}}_k)$ given by
(14) in (16), yielding the same equation with the appropriate constant $K$.
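The demapper rule (16) can be sketched in a few lines. The example below is an illustration under our own assumptions, not the receiver used in the simulations: the candidate set is passed as a list of (bit label, symbol) pairs, the metric is any function of the candidate symbol (for instance (11) or (14) with $\mathbf{y}$ and the channel estimate fixed), and the hypothetical helper `qpsk_symbols` builds a single-antenna QPSK constellation so that the example is easy to check by hand.

```python
import math
from itertools import product

def extrinsic_bit_probability(j, labelled_symbols, metric, p_dec):
    """P_dem(d_j = 1) following (16).

    labelled_symbols: list of (bits, x) pairs, bits being a tuple of 0/1
    metric:           function x -> D(x, y | H) for the current observation
    p_dec:            p_dec[i] = a priori probability that bit i equals 1
    """
    weight = {0: 0.0, 1: 0.0}
    for bits, x in labelled_symbols:
        prior = 1.0
        for i, b in enumerate(bits):
            if i != j:                      # exclude the bit being demapped
                prior *= p_dec[i] if b == 1 else 1.0 - p_dec[i]
        weight[bits[j]] += math.exp(-metric(x)) * prior
    # Dividing by the total weight plays the role of the constant K.
    return weight[1] / (weight[0] + weight[1])

def qpsk_symbols():
    """Hypothetical labelling: bit 0 sets the sign of the real part,
    bit 1 the sign of the imaginary part (unit-energy QPSK)."""
    return [((b0, b1), complex(1 - 2 * b0, 1 - 2 * b1) / math.sqrt(2))
            for b0, b1 in product((0, 1), repeat=2)]
```

As a usage example, take a noiseless observation equal to the symbol labelled (1, 0), a unit channel, and uniform priors: the two candidates with first bit 1 lie at squared distances 0 and 2, the other two at 2 and 4, so the rule gives $P_{\mathrm{dem}}(d_0 = 1) = 1 / (1 + e^{-2})$.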

V. Achievable Information Rates over MIMO Channels 

In this section we derive the achievable information rates, in the sense of outage rates,
associated with a receiver using the decoding rule (5) based on the metrics (14) and (11).

A. Achievable Information Rates Associated to the Improved Decoder 

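Assume a given pair of matrices $(\mathbf{H},\hat{\mathbf{H}})$, characterizing a specific
instance of the channel realization and its estimate. We first derive the instantaneous
achievable rates $C_{\mathcal{M}}^{\mathrm{MIMO}}(\mathbf{H},\hat{\mathbf{H}})$ for MIMO channels
$W(\mathbf{y}|\mathbf{x},\mathbf{H}) = \mathcal{CN}(\mathbf{H}\mathbf{x}, \boldsymbol{\Sigma}_0)$,
associated with a receiver using the derived metric (14). This is done by using the following
theorem from [19], which provides the general expression for the maximal achievable rate with a
given decoding metric.

Theorem 5.1: For any pair of matrices $(\mathbf{H},\hat{\mathbf{H}})$, the maximal achievable
rate associated with a receiver using a metric $\mathcal{D}(\mathbf{x},\mathbf{y}|\hat{\mathbf{H}})$
is given by

$$C_{\mathcal{D}}(\mathbf{H},\hat{\mathbf{H}}) = \sup_{P_X \in \mathcal{P}_\Gamma(\mathcal{X})}\ \inf_{V_{Y|X} \in \mathcal{V}(\mathbf{H},\hat{\mathbf{H}})} I(P_X, V_{Y|X}), \qquad (17)$$

where the mutual information functional is

$$I(P_X, V_{Y|X}) = \iint \log_2 \frac{V_{Y|X}(\mathbf{y}|\mathbf{x},\boldsymbol{\Upsilon})}{\int V_{Y|X}(\mathbf{y}|\mathbf{x}',\boldsymbol{\Upsilon})\, dP_X(\mathbf{x}')}\ dP_X(\mathbf{x})\, dV_{Y|X}(\mathbf{y}|\mathbf{x},\boldsymbol{\Upsilon}), \qquad (18)$$

and $\mathcal{V}(\mathbf{H},\hat{\mathbf{H}})$ denotes the set of test channels, i.e., all
possible uncorrelated MIMO channels
$V_{Y|X}(\mathbf{y}|\mathbf{x},\boldsymbol{\Upsilon}) = \mathcal{CN}(\boldsymbol{\Upsilon}\mathbf{x}, \boldsymbol{\Sigma})$
satisfying¹

$(c_1)$: $\mathrm{tr}\big(\mathbb{E}_P\{\mathbb{E}_V\{\mathbf{y}\mathbf{y}^\dagger\}\}\big) = \mathrm{tr}\big(\mathbb{E}_P\{\mathbb{E}_W\{\mathbf{y}\mathbf{y}^\dagger\}\}\big)$,

$(c_2)$: $\mathbb{E}_P\{\mathbb{E}_V\{\mathcal{D}(\mathbf{x},\mathbf{y}|\hat{\mathbf{H}})\}\} \le \mathbb{E}_P\{\mathbb{E}_W\{\mathcal{D}(\mathbf{x},\mathbf{y}|\hat{\mathbf{H}})\}\}$.

In order to solve the constrained minimization problem in Theorem 5.1 for our metric
$\mathcal{D} = \mathcal{D}_{\mathcal{M}}^{\mathrm{MIMO}}$ (expression (14)), we must find the
channel $\boldsymbol{\Upsilon} \in \mathbb{C}^{M_R \times M_T}$ and the covariance matrix
$\boldsymbol{\Sigma} = \mathbf{I}_{M_R}\sigma^2$ defining the test channel
$V_{Y|X}(\mathbf{y}|\mathbf{x},\boldsymbol{\Upsilon})$ that minimizes the relative entropy (18).
On the other hand, throughout this paper we assume that the transmitter does not have access to
the channel estimates, and consequently no power control is possible. Thus, we choose the
sub-optimal input distribution $P_X = \mathcal{CN}(\mathbf{0}, \boldsymbol{\Sigma}_P)$ with
$\boldsymbol{\Sigma}_P = \mathbf{I}_{M_T} P$. We first compute the constraint set
$\mathcal{V}(\mathbf{H},\hat{\mathbf{H}})$, given by $(c_1)$ and $(c_2)$, and then we factorize
the matrix $\mathbf{H}$ to solve the minimization problem. Before this, to compute the constraint
$(c_2)$, we need the following result (Appendix B).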

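¹Our constraint $(c_1)$ differs from that provided in [19], since here the channel noise is i.i.d.
and consequently we can only satisfy the equality of the matrix traces and not of the covariance
matrices.

Lemma 5.2: Let $\mathbf{A} \in \mathbb{C}^{M_R \times M_T}$ be an arbitrary matrix and
$\mathbf{X}$ be a random vector with pdf $\mathcal{CN}(\mathbf{0}, \boldsymbol{\Sigma}_P)$. For
all real positive constants $K_1, K_2 > 0$, the following equality holds:

$$\mathbb{E}_{\mathbf{X}}\left\{\frac{\|\mathbf{A}\mathbf{X}\|^2 + K_1}{\|\mathbf{X}\|^2 + K_2}\right\} = \frac{\|\mathbf{A}\|_F^2}{n+1} + \left[\frac{K_1}{K_2} - \frac{\|\mathbf{A}\|_F^2}{n+1}\right] \left(\frac{K_2}{P}\right)^{n+1} \exp\left(\frac{K_2}{P}\right) \Gamma(-n, K_2/P), \qquad (19)$$

where $n = M_T - 1$ with $n \in \mathbb{N}_+$,

$$\Gamma(-n,t) = \frac{(-1)^n}{n!}\left[\Gamma(0,t) - \exp(-t)\sum_{i=0}^{n-1} (-1)^i \frac{i!}{t^{i+1}}\right],$$

and $\Gamma(0,t) = \int_t^{+\infty} u^{-1}\exp(-u)\,du$ denotes the exponential integral function.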
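Lemma 5.2 lends itself to a quick numerical sanity check. The sketch below is illustrative only: the function names, the power-series evaluation of the exponential integral (adequate for moderate arguments), and the test values are our own choices. It evaluates the right-hand side of (19) and compares it with a Monte Carlo estimate of the left-hand side for $\boldsymbol{\Sigma}_P = P\,\mathbf{I}_{M_T}$.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def exp_integral_e1(t, terms=60):
    """Gamma(0, t) = E_1(t), via the series
    E_1(t) = -gamma - ln t + sum_{k>=1} (-1)^{k+1} t^k / (k k!)."""
    s, term = 0.0, 1.0
    for k in range(1, terms + 1):
        term *= -t / k            # term = (-t)^k / k!
        s -= term / k
    return -EULER_GAMMA - math.log(t) + s

def upper_gamma_neg(n, t):
    """Gamma(-n, t) for integer n >= 0, from the finite-sum identity above."""
    tail = sum((-1) ** i * math.factorial(i) / t ** (i + 1) for i in range(n))
    return ((-1) ** n / math.factorial(n)
            * (exp_integral_e1(t) - math.exp(-t) * tail))

def lemma_closed_form(frob2, M_T, P, K1, K2):
    """Right-hand side of (19); frob2 is the squared Frobenius norm of A."""
    n = M_T - 1
    t = K2 / P
    factor = t ** (n + 1) * math.exp(t) * upper_gamma_neg(n, t)
    return frob2 / (n + 1) + (K1 / K2 - frob2 / (n + 1)) * factor

def lemma_monte_carlo(A, P, K1, K2, trials=20000, seed=3):
    """Monte Carlo estimate of the left-hand side of (19)."""
    rng = random.Random(seed)
    s = (P / 2.0) ** 0.5
    acc = 0.0
    for _ in range(trials):
        x = [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in A[0]]
        ax2 = sum(abs(sum(a * xi for a, xi in zip(row, x))) ** 2 for row in A)
        x2 = sum(abs(xi) ** 2 for xi in x)
        acc += (ax2 + K1) / (x2 + K2)
    return acc / trials
```

For example, with $\mathbf{A} = \mathrm{diag}(1, 2)$ (so $\|\mathbf{A}\|_F^2 = 5$), $M_T = 2$ and $P = K_1 = K_2 = 1$, the closed form evaluates to approximately 1.8945, and the simulated average should agree with it to within the Monte Carlo error.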

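From Lemma 5.2 and some algebra, it is not difficult to show that the constraints require that

$$(c_1):\ \mathrm{tr}\big(\boldsymbol{\Upsilon}\boldsymbol{\Sigma}_P\boldsymbol{\Upsilon}^\dagger + \boldsymbol{\Sigma}\big) = \mathrm{tr}\big(\mathbf{H}\boldsymbol{\Sigma}_P\mathbf{H}^\dagger + \boldsymbol{\Sigma}_0\big), \qquad (20)$$

$$(c_2):\ \|\boldsymbol{\Upsilon} + a_{\mathcal{M}}\hat{\mathbf{H}}\|_F^2 \le \|\mathbf{H} + a_{\mathcal{M}}\hat{\mathbf{H}}\|_F^2 + C, \qquad (21)$$

where

$$a_{\mathcal{M}} = \delta\big(\delta\sigma_{\mathcal{E}}^2 P - \lambda_n\sigma_z^2\big)\big[M_T\,\delta\sigma_{\mathcal{E}}^2\lambda_n P + \lambda_n\sigma_z^2 - \delta\sigma_{\mathcal{E}}^2 P\big]^{-1},$$

$$C = M_T\lambda_n\big[\|\mathbf{H}\|_F^2 - \|\boldsymbol{\Upsilon}\|_F^2 + P^{-1}\big(\mathrm{tr}(\boldsymbol{\Sigma}_0) - \mathrm{tr}(\boldsymbol{\Sigma})\big)\big]\left[1 - \frac{\sigma_z^2}{\delta\sigma_{\mathcal{E}}^2 P}\lambda_n - M_T\lambda_n\right]^{-1},$$

$$\lambda_n = \left(\frac{\sigma_z^2}{\delta\sigma_{\mathcal{E}}^2 P}\right)^{n} \exp\left(\frac{\sigma_z^2}{\delta\sigma_{\mathcal{E}}^2 P}\right) \Gamma\left(-n, \frac{\sigma_z^2}{\delta\sigma_{\mathcal{E}}^2 P}\right), \quad \text{with } n = M_T - 1.$$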

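From expression (21) and computing the relative entropy, the minimization in (17) writes

$$\begin{cases} \min_{\boldsymbol{\Upsilon}}\ \log_2\det\big(\mathbf{I}_{M_R} + \boldsymbol{\Upsilon}\boldsymbol{\Sigma}_P\boldsymbol{\Upsilon}^\dagger\boldsymbol{\Sigma}^{-1}\big), \\ \text{subject to } \|\boldsymbol{\Upsilon} + a_{\mathcal{M}}\hat{\mathbf{H}}\|_F^2 \le \|\mathbf{H} + a_{\mathcal{M}}\hat{\mathbf{H}}\|_F^2 + C, \end{cases} \qquad (22)$$

where $\boldsymbol{\Sigma}$ must be chosen such that
$\mathrm{tr}(\boldsymbol{\Upsilon}\boldsymbol{\Sigma}_P\boldsymbol{\Upsilon}^\dagger + \boldsymbol{\Sigma}) = \mathrm{tr}(\mathbf{H}\boldsymbol{\Sigma}_P\mathbf{H}^\dagger + \boldsymbol{\Sigma}_0)$.
In order to obtain a simpler and more tractable expression of (22), we consider the following
decomposition of the matrix $\mathbf{H} = \mathbf{U}\,\mathrm{diag}(\boldsymbol{\lambda})\,\mathbf{V}^\dagger$
with $\boldsymbol{\lambda} = (\lambda_1, \dots, \lambda_{M_R})^T$. Let $\mathrm{diag}(\boldsymbol{\mu})$
be a diagonal matrix such that $\mathrm{diag}(\boldsymbol{\mu}) = \mathbf{U}^\dagger\boldsymbol{\Upsilon}\mathbf{V}$,
whose diagonal values are given by the vector $\boldsymbol{\mu} = (\mu_1, \dots, \mu_{M_R})^T$.
We define $\tilde{\mathbf{H}} = \mathbf{V}^\dagger\hat{\mathbf{H}}^\dagger\mathbf{U}$, the vector
$\tilde{\mathbf{h}} = \mathrm{diag}(\tilde{\mathbf{H}}^\dagger)^T$ formed by its diagonal, and let
$b_{\mathcal{M}} = \|\mathbf{H} + a_{\mathcal{M}}\hat{\mathbf{H}}\|_F^2 - a_{\mathcal{M}}^2\big(\|\hat{\mathbf{H}}\|_F^2 - \|\tilde{\mathbf{h}}\|^2\big)$.
Using the above definitions and some algebra, the optimization (22) becomes equivalent to

$$C_{\mathcal{M}}^{\mathrm{MIMO}}(\mathbf{H},\hat{\mathbf{H}}) = \begin{cases} \min_{\boldsymbol{\mu}}\ \sum_{i=1}^{M_R} \log_2\left(1 + \dfrac{P|\mu_i|^2}{\sigma^2(\boldsymbol{\mu})}\right), \\ \text{subject to } \|\boldsymbol{\mu} + a_{\mathcal{M}}\tilde{\mathbf{h}}\|^2 \le b_{\mathcal{M}}, \end{cases} \qquad (23)$$

with $\sigma^2(\boldsymbol{\mu}) = \frac{P}{M_R}\big(\|\boldsymbol{\lambda}\|^2 - \|\boldsymbol{\mu}\|^2\big) + \sigma_z^2$.
The constraint set in the minimization (23), which corresponds to the set of vectors
$\{\boldsymbol{\mu} \in \mathbb{C}^{M_T \times 1} : \|\boldsymbol{\mu} + a_{\mathcal{M}}\tilde{\mathbf{h}}\|^2 \le b_{\mathcal{M}}\}$,
is a closed convex polyhedral set. Thus, the infimum in (23) is attained at the extremal of the
set given by the equality (cf. [20]). Furthermore, for every vector $\boldsymbol{\mu}$ such that
$\|\boldsymbol{\mu}\|^2 \le \|\boldsymbol{\lambda}\|^2$, we observe that expression (23) is a
monotonically increasing function of the squared norm of $\boldsymbol{\mu}$. As a consequence, it
is sufficient to find the optimal vector by minimizing the squared norm over the constraint set.
This becomes a classical minimization problem that can be easily solved by using Lagrange
multipliers. The corresponding achievable rates are then presented in the following corollary.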

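Corollary 5.3: Given a pair of matrices $(\mathbf{H},\hat{\mathbf{H}})$, the following
information rates can be achieved by a receiver using the decoding rule (5) based on the metric
(14), for uncorrelated MIMO channels:

$$C_{\mathcal{M}}^{\mathrm{MIMO}}(\mathbf{H},\hat{\mathbf{H}}) = \log_2\det\big(\mathbf{I}_{M_R} + \boldsymbol{\Upsilon}_{\mathrm{opt}}\boldsymbol{\Sigma}_P\boldsymbol{\Upsilon}_{\mathrm{opt}}^\dagger\,\sigma^{-2}(\boldsymbol{\mu}_{\mathrm{opt}}^{\mathcal{M}})\big), \qquad (24)$$

where the optimal solution is
$\boldsymbol{\Upsilon}_{\mathrm{opt}} = \mathbf{U}\,\mathrm{diag}(\boldsymbol{\mu}_{\mathrm{opt}}^{\mathcal{M}})\,\mathbf{V}^\dagger$,
with $\sigma^2(\boldsymbol{\mu}_{\mathrm{opt}}^{\mathcal{M}}) = \frac{P}{M_R}\big(\|\boldsymbol{\lambda}\|^2 - \|\boldsymbol{\mu}_{\mathrm{opt}}^{\mathcal{M}}\|^2\big) + \sigma_z^2$
and

$$\boldsymbol{\mu}_{\mathrm{opt}}^{\mathcal{M}} = \begin{cases} \ldots & \text{if } b_{\mathcal{M}} \ge 0, \\ \ldots & \text{otherwise.} \end{cases} \qquad (25)$$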

B. Achievable Information Rates Associated to the Mismatched ML decoder 

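Next, we aim at comparing the achievable rates obtained in (24) to those provided by the
classical mismatched ML decoder (11). Following the same steps as above, we can compute the
achievable rates associated with the mismatched ML decoder. In this case, the minimization
problem writes

$$\begin{cases} \min_{\boldsymbol{\Upsilon}}\ \log_2\det\big(\mathbf{I}_{M_R} + \boldsymbol{\Upsilon}\boldsymbol{\Sigma}_P\boldsymbol{\Upsilon}^\dagger\boldsymbol{\Sigma}^{-1}\big), \\ \text{subject to } \mathrm{Re}\{\mathrm{tr}(\mathbf{H}\boldsymbol{\Sigma}_P\hat{\mathbf{H}}^\dagger)\} \le \mathrm{Re}\{\mathrm{tr}(\boldsymbol{\Upsilon}\boldsymbol{\Sigma}_P\hat{\mathbf{H}}^\dagger)\}, \end{cases} \qquad (26)$$

where $\boldsymbol{\Sigma}$ must be chosen such that
$\mathrm{tr}(\boldsymbol{\Upsilon}\boldsymbol{\Sigma}_P\boldsymbol{\Upsilon}^\dagger + \boldsymbol{\Sigma}) = \mathrm{tr}(\mathbf{H}\boldsymbol{\Sigma}_P\mathbf{H}^\dagger + \boldsymbol{\Sigma}_0)$.
The resulting achievable rates are given by

$$C_{\mathrm{ML}}^{\mathrm{MIMO}}(\mathbf{H},\hat{\mathbf{H}}) = \log_2\det\big(\mathbf{I}_{M_R} + \boldsymbol{\Upsilon}_{\mathrm{opt}}\boldsymbol{\Sigma}_P\boldsymbol{\Upsilon}_{\mathrm{opt}}^\dagger\,\sigma^{-2}(\boldsymbol{\mu}_{\mathrm{opt}}^{\mathrm{ML}})\big), \qquad (27)$$

where $\boldsymbol{\Upsilon}_{\mathrm{opt}} = \mathbf{U}\,\mathrm{diag}(\boldsymbol{\mu}_{\mathrm{opt}}^{\mathrm{ML}})\,\mathbf{V}^\dagger$ and

$$\sigma^2(\boldsymbol{\mu}_{\mathrm{opt}}^{\mathrm{ML}}) = \frac{P}{M_R}\big(\|\boldsymbol{\lambda}\|^2 - \|\boldsymbol{\mu}_{\mathrm{opt}}^{\mathrm{ML}}\|^2\big) + \sigma_z^2,$$

$$\boldsymbol{\mu}_{\mathrm{opt}}^{\mathrm{ML}} = \frac{\mathrm{Re}\{\boldsymbol{\lambda}^\dagger\tilde{\mathbf{h}}\}}{\|\tilde{\mathbf{h}}\|^2}\,\tilde{\mathbf{h}}.$$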

C. Estimation-Induced Outage Rates 

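Throughout this section, we have so far considered the instantaneous achievable rates (24) over
MIMO channels. We now provide the associated outage rates, according to the notion of EIO
capacity defined in Section II-B. In order to compute these outage rates, it is necessary to
calculate the outage probability as a function of the outage rate. Given an outage rate $R \ge 0$
and a channel estimate $\hat{\mathbf{H}}$, the outage probability is defined as

$$P_{\mathcal{M}}^{\mathrm{out}}(R,\hat{\mathbf{H}}) = \int_{\{\mathbf{H} \in \mathbb{C}^{M_R \times M_T} :\ C_{\mathcal{M}}^{\mathrm{MIMO}}(\mathbf{H},\hat{\mathbf{H}}) < R\}} d\psi_{\mathbf{H}|\hat{\mathbf{H}}}(\mathbf{H}|\hat{\mathbf{H}});$$

then the maximal outage rate for an outage probability $\gamma_{\mathrm{QoS}}$ is given by

$$C_{\mathcal{M}}^{\mathrm{out}}(\gamma_{\mathrm{QoS}},\hat{\mathbf{H}}) = \sup_{R}\big\{R \ge 0 : P_{\mathcal{M}}^{\mathrm{out}}(R,\hat{\mathbf{H}}) \le \gamma_{\mathrm{QoS}}\big\}. \qquad (29)$$

Since this outage rate still depends on the channel estimate, we consider the average over all
channel estimates,
$\bar{C}_{\mathcal{M}}^{\mathrm{out}}(\gamma_{\mathrm{QoS}}) = \mathbb{E}_{\hat{\mathbf{H}}}\{C_{\mathcal{M}}^{\mathrm{out}}(\gamma_{\mathrm{QoS}},\hat{\mathbf{H}})\}$.
These achievable rates are upper bounded by the mean outage rates given by the EIO capacity,
which provides the maximal outage rate (i.e., maximized over all possible receivers using the
channel estimates), achieved by a theoretical decoder. In our case, this capacity is given by
$\bar{C}(\gamma_{\mathrm{QoS}}) = \mathbb{E}_{\hat{\mathbf{H}}}\{C(\gamma_{\mathrm{QoS}},\hat{\mathbf{H}})\}$,
where $C(\gamma_{\mathrm{QoS}},\hat{\mathbf{H}})$ can be computed from (4) by setting
$\theta = \mathbf{H}$ and $\hat\theta = \hat{\mathbf{H}}$.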
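In practice, the outage rate (29) can be approximated from Monte Carlo samples. The sketch below assumes that, for a fixed estimate $\hat{\mathbf{H}}$, a list of instantaneous rates has already been computed for channel matrices drawn from the a posteriori pdf (12); how those rates are obtained is left outside the example, and the function names are hypothetical. The outage probability is replaced by the empirical fraction of samples strictly below $R$, so that the supremum in (29) becomes an order statistic of the samples.

```python
import math

def empirical_outage_probability(rates, R):
    """Fraction of sampled instantaneous rates strictly below R."""
    return sum(1 for c in rates if c < R) / len(rates)

def empirical_outage_rate(rates, gamma_qos):
    """Largest R whose empirical outage probability does not exceed gamma_qos.

    With k = floor(gamma_qos * n), at most k samples may lie strictly below R,
    so the supremum is the (k+1)-th smallest sample.
    """
    ordered = sorted(rates)
    n = len(ordered)
    k = int(math.floor(gamma_qos * n + 1e-12))   # guard against float round-off
    return ordered[min(k, n - 1)]

def mean_outage_rate(rates_per_estimate, gamma_qos):
    """Average of the outage rates over a collection of channel estimates."""
    values = [empirical_outage_rate(r, gamma_qos) for r in rates_per_estimate]
    return sum(values) / len(values)
```

For instance, with ten samples equal to 1, 2, ..., 10 and $\gamma_{\mathrm{QoS}} = 0.1$, exactly one sample may fall below the outage rate, which is therefore 2.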

VI. Simulation Results 

In this section we provide numerical results to analyze the performance of a receiver using the
decoder based on the metric (14). We consider uncorrelated Rayleigh fading MIMO channels,
assuming that the channel changes for each compound symbol inside a frame of L = 50 symbols.
This assumption is made so that the BICM interleaver operates efficiently. Performance is
measured in terms of BER and achievable outage rates. The binary information data is encoded
by a rate 1/2 non-recursive non-systematic convolutional (NRNSC) channel code with constraint
length 3 defined in octal form by (5, 7). The interleaver is random and operates over the entire
frame, whose size is $L M_T \log_2(B)$ bits. The symbols belong to a 16-QAM constellation with either
Gray or set-partition labeling. In addition, it is assumed that the average pilot symbol energy is
equal to the average data symbol energy.
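
As an illustration of the coding setup described above, the following sketch encodes a bit sequence with the rate 1/2 NRNSC code of generators (5, 7) in octal and computes the interleaver size for the stated frame. It assumes a zero initial state, no trellis termination, and that the leftmost bit of each generator taps the current input bit; the function names are hypothetical.

```python
import math

def conv_encode(bits, generators=(0o5, 0o7), constraint_length=3):
    """Feedforward (non-recursive, non-systematic) convolutional encoder."""
    register = 0  # most significant bit holds the newest input bit
    coded = []
    for bit in bits:
        register = ((bit & 1) << (constraint_length - 1)) | (register >> 1)
        for g in generators:
            # Each output bit is the parity of the register bits tapped by g.
            coded.append(bin(register & g).count("1") % 2)
    return coded

def interleaver_size(frame_len, m_t, constellation_size):
    """Coded bits per frame: L * M_T * log2(B)."""
    return frame_len * m_t * int(math.log2(constellation_size))

print(conv_encode([1, 0, 0]))        # impulse response of the (5, 7) code
print(interleaver_size(50, 2, 16))   # L = 50 symbols, 2 transmit antennas, 16-QAM
```

For the 2 x 2 setup with 16-QAM, the frame carries 400 coded bits, that is, 200 information bits per frame at rate 1/2 under the no-termination assumption.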

A. Bit Error Rate Analysis of BICM Decoding Under Imperfect Channel Estimation 

Here, we compare the BER performance of the proposed decoder (14) and the mismatched
decoder (11) for BICM decoding (section IV). Figs. 3 and 4 show, for a 2 x 2 MIMO channel
($M_T = M_R = 2$), the increase in the required $E_b/N_0$ caused by decoding with the mismatched
ML decoder in the presence of CEE. The BER obtained with perfect CSIR is also shown for
comparison. In this case, we insert N = 2, 4 or 8 pilots per frame for channel training.
At BER = $10^{-4}$ and N = 2, we observe about 1.4 dB of SNR gain with set-partition labeling by
using the proposed decoder. The performance improvement is higher with set-partition labeling
(which is well suited to iterative decoding) than with Gray labeling (which is preferred if no iteration is allowed).

We also note that the performance loss of the mismatched receiver with respect to our receiver
becomes insignificant for $N \ge 8$. This can be explained from (13), since both decoders coincide
as the number of pilot symbols increases. The results show that the decoder under investigation
outperforms the mismatched decoder, especially when few pilot symbols are dedicated to training.
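
The convergence of the two decoders for long training can be illustrated numerically. The sketch below assumes that the proposed metric is the negative logarithm of the composite channel of Appendix A with $\Sigma_0 = \sigma_Z^2 I$ and $\Sigma_E = \sigma_E^2 I$, that the mismatched metric is the negative logarithm of $\mathcal{CN}(\hat{H}x, \sigma_Z^2 I)$, and that $\delta = \sigma_h^2/(\sigma_h^2 + \sigma_E^2)$ with $\sigma_h^2 = 1$. The function names and numerical values are hypothetical.

```python
import numpy as np

def proposed_metric(y, x, h_hat, sigma_z2, sigma_e2, delta):
    """-log CN(y; delta*H_hat x, (sigma_z2 + delta*sigma_e2*||x||^2) I), constants dropped."""
    var = sigma_z2 + delta * sigma_e2 * np.vdot(x, x).real
    resid = y - delta * (h_hat @ x)
    return y.size * np.log(var) + np.vdot(resid, resid).real / var

def mismatched_metric(y, x, h_hat, sigma_z2):
    """-log CN(y; H_hat x, sigma_z2 I), constants dropped: the estimate is treated as exact."""
    resid = y - h_hat @ x
    return y.size * np.log(sigma_z2) + np.vdot(resid, resid).real / sigma_z2

rng = np.random.default_rng(1)
h_hat = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
x = np.array([1 + 1j, -3 + 1j])  # one pair of 16-QAM symbols
y = h_hat @ x + 0.1 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))

for sigma_e2 in (0.5, 0.05, 0.0):
    delta = 1.0 / (1.0 + sigma_e2)  # sigma_h2 / (sigma_h2 + sigma_e2) with sigma_h2 = 1
    gap = (proposed_metric(y, x, h_hat, 0.01, sigma_e2, delta)
           - mismatched_metric(y, x, h_hat, 0.01))
    print(sigma_e2, gap)
```

As the estimation error variance goes to zero, $\delta$ tends to one and the symbol-dependent variance term disappears, so the two metrics become identical. For a large error variance, the proposed metric penalizes high-energy symbols differently, which is where the two decoders can reach different decisions.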

B. Achievable Outage Rates Using the Derived Metric 

The numerical results on the information rates achievable when decoding with the investigated
metric over fading MIMO channels are based on Monte Carlo simulations.

Fig. 5 compares the average outage rates (in bits per channel use) over all channel estimates of
both mismatched ML decoding (given by expression (27)) and the proposed metric (given by
(24)) versus the SNR. The 2 x 2 MIMO channel is estimated by sending N = 2 pilot symbols
per frame, and the outage probability has been set to $\gamma_{QoS} = 0.01$. For comparison, we also
display the upper bound on these rates given by the EIO capacity (obtained by evaluating the
EIO capacity expression), and the capacity with perfect channel knowledge. It can be observed that the
achievable rate using mismatched ML decoding is about 5 dB of SNR (at a mean outage rate of 6
bits) away from the EIO capacity. In contrast, the proposed decoder achieves
higher rates at every SNR value and reduces the aforementioned SNR gap by about 1.5 dB.

Similar plots are shown in Fig. 6 for a 4 x 4 MIMO channel estimated by sending
training sequences of length N = 4. Again, it can be observed that the modified decoder achieves
higher rates than the mismatched decoder. However, the performance degradation
due to the mismatched decoder has decreased to less than 1 dB (at a mean outage rate of 10 bits).
This observation is a consequence of using orthogonal training sequences, which require $N \ge M_T$
(CEE are reduced by increasing the number of antennas [21]). In contrast, for $N < M_T$ (using
non-orthogonal sequences), the performance degradation would be larger than the one observed here.
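
The link between the training length and the CEE can be illustrated with a small simulation. The sketch below builds orthogonal pilots from rows of a DFT matrix (an assumed construction; the training sequences used in the simulations are not specified here), applies the ML (least-squares) channel estimate, and compares the empirical per-entry error variance with $\sigma_Z^2/(N P)$, which holds for any training matrix $X_T$ satisfying $X_T X_T^{\dagger} = N P\, I_{M_T}$. The function names and numerical values are hypothetical.

```python
import numpy as np

def orthogonal_pilots(m_t, n, power):
    """M_T x N training matrix with X X^dagger = N * power * I (rows of a DFT matrix)."""
    if n < m_t:
        raise ValueError("orthogonal training needs N >= M_T")
    k = np.arange(m_t)[:, None]
    t = np.arange(n)[None, :]
    return np.sqrt(power) * np.exp(-2j * np.pi * k * t / n)

def ml_channel_estimate(y_t, x_t):
    """ML estimate for Y_T = H X_T + Z_T with white Gaussian noise."""
    gram = x_t @ x_t.conj().T
    return y_t @ x_t.conj().T @ np.linalg.inv(gram)

def empirical_error_variance(m_r, m_t, n, power, sigma_z2, trials, rng):
    """Average squared magnitude of the entries of H_hat - H over random trials."""
    x_t = orthogonal_pilots(m_t, n, power)
    total = 0.0
    for _ in range(trials):
        h = (rng.standard_normal((m_r, m_t)) + 1j * rng.standard_normal((m_r, m_t))) / np.sqrt(2.0)
        z = np.sqrt(sigma_z2 / 2.0) * (rng.standard_normal((m_r, n)) + 1j * rng.standard_normal((m_r, n)))
        err = ml_channel_estimate(h @ x_t + z, x_t) - h
        total += np.mean(np.abs(err) ** 2)
    return total / trials

rng = np.random.default_rng(3)
for n in (2, 4, 8):
    measured = empirical_error_variance(2, 2, n, 1.0, 1.0, 2000, rng)
    print(n, measured, 1.0 / n)  # empirical value versus sigma_z2 / (N * power)
```

Since the error variance scales as $1/N$ and orthogonality forces $N \ge M_T$, a larger array comes with a longer training sequence and hence a smaller estimation error per channel entry.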

Note that the achievable rates of the proposed decoder are still about 3 dB away from the ultimate
performance given by the EIO capacity. However, the new metric provides significant gains in
terms of information rates compared to the classical mismatched approach.

VII. Summary

This paper studied the problem of reception in practical communication systems, when the
receiver has access only to noisy estimates of the channel and these estimates are not available
at the transmitter. Specifically, we focused on determining the optimal decoder that achieves the
EIO capacity of arbitrary memoryless channels under imperfect channel estimation. By using the
tools of information theory, we derived a practical decoding metric that minimizes the average
of the transmission error probability over all CEE. This decoder is not optimal in the sense that
it cannot achieve the EIO capacity, but it offers improved performance without introducing
any additional decoding complexity.

By using the general decoder, we analyzed the case of uncorrelated fading MIMO channels
with ML channel estimation at the decoder and without channel information at the transmitter.
Then, we used this metric for iterative BICM decoding of MIMO systems. Moreover, we obtained
the maximal achievable rates, using Gaussian codebooks, associated to the proposed decoder and
compared these rates to those of the classical mismatched ML decoder. Simulation results indicate
that mismatched ML decoding is sub-optimal under short training sequences, in terms of both
BER and achievable outage rates, and confirm the adequacy of the proposed decoder.

Although we showed that the proposed decoder outperforms classical mismatched approaches,
the derivation of a practical decoder that achieves the EIO capacity (over all possible theoretical
decoders) under imperfect channel estimation is still an open problem in its full generality.
In addition, other types of decoding metrics that also incorporate the outage probability value
have yet to be fully explored.

Appendix 

A. Metric evaluation 

Theorem 1.1: Let $H_i \in \mathbb{C}^{M_R \times M_T}$ ($i = 1, 2$) be circularly symmetric complex Gaussian random
matrices with zero means and full-rank Hermitian covariance matrices $\Sigma_{ij} = \mathbb{E}\{(H_i)_k (H_j)_k^{\dagger}\}$ of
the columns $(H_i)_k$ of $H_i$ (assumed to be the same for all columns) for $i, j = 1, 2$. Then the random
variable $H_1|H_2 \sim \mathcal{CN}(\mu, I_{M_T} \otimes \Sigma)$ is circularly symmetric complex Gaussian with mean
$\mu = \Sigma_{12}\Sigma_{22}^{-1}H_2$ and covariance matrix of its columns $\Sigma = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$.
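
As a numerical sanity check of Theorem 1.1, the sketch below evaluates the conditional mean gain and column covariance for the choice $\Sigma_{11} = \Sigma_{12} = \Sigma_H$ and $\Sigma_{22} = \Sigma_H + \Sigma_E$, and compares them with $\Sigma_\Delta = \Sigma_H(\Sigma_E + \Sigma_H)^{-1}$ and $\Sigma_\Delta\Sigma_E$. It assumes $\Sigma_{21} = \Sigma_{12}^{\dagger}$; the covariance matrices are random test values and the function names are hypothetical.

```python
import numpy as np

def conditional_gaussian(sigma_11, sigma_12, sigma_22):
    """Return (G, S): conditional mean is G @ H_2 and column covariance is S."""
    gain = sigma_12 @ np.linalg.inv(sigma_22)
    cov = sigma_11 - gain @ sigma_12.conj().T  # uses Sigma_21 = Sigma_12^dagger (assumption)
    return gain, cov

def random_covariance(dim, rng):
    """Random Hermitian, full-rank covariance matrix used only as a test value."""
    a = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    return a @ a.conj().T + np.eye(dim)

rng = np.random.default_rng(2)
sigma_h = random_covariance(2, rng)
sigma_e = random_covariance(2, rng)

gain, cov = conditional_gaussian(sigma_h, sigma_h, sigma_h + sigma_e)
sigma_delta = sigma_h @ np.linalg.inv(sigma_e + sigma_h)

print(np.allclose(gain, sigma_delta))            # mean gain equals Sigma_Delta
print(np.allclose(cov, sigma_delta @ sigma_e))   # column covariance equals Sigma_Delta Sigma_E
```

Both checks rely only on the identity $\Sigma_H - \Sigma_H(\Sigma_H + \Sigma_E)^{-1}\Sigma_H = \Sigma_H(\Sigma_H + \Sigma_E)^{-1}\Sigma_E$, so they hold for any pair of full-rank covariance matrices.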

From the channel estimation model and (10), by choosing $\Sigma_{11} = \Sigma_{12} = \Sigma_H$ and $\Sigma_{22} = \Sigma_H + \Sigma_E$ in Theorem
1.1, we obtain the a posteriori pdf $\psi_{H|\hat{H}_{ML}}(H|\hat{H}_{ML}) = \mathcal{CN}(\Sigma_\Delta\hat{H}_{ML},\ I_{M_T} \otimes \Sigma_\Delta\Sigma_E)$, where
$\Sigma_\Delta = \Sigma_H(\Sigma_E + \Sigma_H)^{-1}$. In order to evaluate the general expression of the decoding metric
for fading MIMO channels, we compute the expectation of $W(y|x,H) = \mathcal{CN}(Hx, \Sigma_0)$ over
the pdf $\psi_{H|\hat{H}_{ML}}(H|\hat{H}_{ML})$. To this end, we need the following result (see [22]).

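Theorem 1.2: For a circularly symmetric complex random vector $v \sim \mathcal{CN}(\mu, \Pi)$ with mean
$\mu = \mathbb{E}_v\{v\}$ and covariance matrix $\Pi = \mathbb{E}_v\{vv^{\dagger}\} - \mu\mu^{\dagger}$, and a Hermitian positive definite matrix
$\Lambda$ such that $I + \Pi\Lambda \succ 0$, we have

$$\mathbb{E}_v\{\exp(-v^{\dagger}\Lambda v)\} = \det(I + \Pi\Lambda)^{-1}\exp\big[-\mu^{\dagger}\Lambda(I + \Pi\Lambda)^{-1}\mu\big]. \qquad (30)$$

From this theorem, we can compute the composite channel $\widetilde{W}(y|x,\hat{H})$. Let us define $v = y - Hx$
such that the conditional pdf of $v$ given $(\hat{H}, x)$ is $v|(\hat{H},x) \sim \mathcal{CN}(\mu, \Pi)$ with $\mu = y - \Sigma_\Delta\hat{H}x$
and $\Pi = \Sigma_\Delta\Sigma_E\|x\|^2$. Thus, by setting $\Lambda = \Sigma_0^{-1}$ in (30), and after some algebra, we obtain

$$\widetilde{W}(y|x,\hat{H}) = \mathcal{CN}\big(\delta\hat{H}x,\ \Sigma_0 + \delta\Sigma_E\|x\|^2\big),$$

where $\delta$ denotes the scalar such that $\Sigma_\Delta = \delta I_{M_R}$.

B. Proof of Lemma 5.2

Consider the quadratic expressions $Q_1(x) = \|Ax\|^2 + K_1$ and $Q_2(x) = \|x\|^2 + K_2$, where $x$
is a vector of $M_T$ elements, such that $Q_1, Q_2 > 0$ almost surely. The joint moment generating function
of $Q_1$ and $Q_2$ is $M_{Q_1,Q_2}(t_1,t_2) = \mathbb{E}_x\{\exp(t_1Q_1(x) + t_2Q_2(x))\}$. It is easy to see that
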
$$M_{Q_1,Q_2}(t_1,t_2) = \exp(t_1K_1 + t_2K_2)\,\big|I_{M_T} - (t_1A^{\dagger}A + t_2 I_{M_T})\Sigma_P\big|^{-1/2}. \qquad (31)$$

Then, from the Gamma integral and setting $t_2 = -z$ in (31), we have

$$\mathbb{E}_x\left\{\frac{Q_1(x)}{Q_2(x)}\right\} = \int_0^{\infty}\mathbb{E}_x\{Q_1(x)\exp[-zQ_2(x)]\}\,dz, \qquad (32)$$

where it is not difficult to show that

$$\mathbb{E}_x\{Q_1(x)\exp[-zQ_2(x)]\} = \frac{\partial M_{Q_1,Q_2}(t_1,-z)}{\partial t_1}\bigg|_{t_1=0} = \big[K_1 + 2^{-1}\mathrm{tr}(A\Sigma_PA^{\dagger})(1 + zP)^{-1}\big]\,(1 + zP)^{-M_T/2}\exp(-zK_2). \qquad (33)$$

Finally, by solving the integral in (32), we obtain the expression (19).

Fig. 1. Block diagram of MIMO-BICM transmission scheme.

Fig. 2. Block diagram of MIMO-BICM receiver.

Fig. 3. BER performances over 2x2 MIMO with Rayleigh fading for various training sequence lengths and Gray labeling. (Panel title: 2x2 MIMO, 16-QAM with Gray labeling, 4 decoding iterations; horizontal axis: E_b/N_0 (dB).)

Fig. 4. BER performances over 2x2 MIMO with Rayleigh fading for various training sequence lengths and set-partition labeling. (Panel title: 2x2 MIMO, 16-QAM with set-partition labeling, 4 decoding iterations.)

Fig. 5. Expected outage rates over 2x2 MIMO with Rayleigh fading versus SNR (N = 2). (Panel title: 2x2 MIMO, outage probability γ = 0.01.)

Fig. 6. Expected outage rates over 4x4 MIMO with Rayleigh fading versus SNR (N = 4). (Panel title: 4x4 MIMO, outage probability γ = 0.01.)